feat(rca): generic multi-client RCA agent plugin #1
Open
ruturaj-browserstack wants to merge 12 commits into
…EADME Identity-only .claude-plugin/plugin.json; root .mcp.json wires the bstack MCP server (stdio); config/rca.config.json centralizes all formerly-hardcoded product/infra values (no kubectl/chitragupta/bifrost literals); /rca-build command parses build id + mode and hands off to the skill. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Port the obs-tfa-rca loop decoupled: ai-tfa-coordinator drives tfaRcaTurn to a terminal RCA (turn-cap, one-thread, soft-PENDING, digest-not-dump) with the gather mechanism routed by capability (no kubectl/chitragupta/bifrost literals). lib/routing.mjs classifies each ask skip/gather/gap against the config registry + capability manifest; the gap action is the only mode fork (auto=unavailable, interactive=ask-user). references/evidence-routing.md carries the digest format and size caps verbatim. Adds sibling pre-seed one-turn-confirm hook. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
SKILL.md orchestrator spec: mandatory GitHub intake ('I don't have one' → RCA-only;
headless missing-input fail-fast), discovery via listTestIds(failed,
includeFailureDetail), then cluster/pre-compute/fan-out/report steps.
lib/csv-state.mjs is the resumable WAL spine — seed (idempotent, terminal-
preserving), claim/heartbeat/flip, reaper, pendingRows — with timestamps injected
(workflow-sandbox-safe) and an RFC4180 codec for multiline RCA fields.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…otocol lib/signature.mjs computes signature = normalize(category|error|file) off the U1 discovery payload (folds timestamps/uuids/hex/line:col/numbers), groups rows by signature, picks a deterministic representative (non-flaky, then smallest id), and leaves signal-less rows as their own singletons. references/clustering.md documents the O(causes) protocol: representative runs the full loop; siblings pre-seed a one-turn confirm against their own logs with a fall-back-to-own-loop safeguard (never blindly inherit). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
buildManifest enumerates the client's discovered capabilities once into
capability→{available,via}, declared to the user + TFA so no evidence is asked
for that the client provably can't get. lib/evidence-cache.mjs computes the
last-green→this-build delta once and caches by (repo,range,evidenceType) — fresh
per-run Map, no module globals (multi-tenant-safe) — with resolveBaseline for the
never-green fallback. Routes the same grounded window into every coordinator.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
workflows/rca-batch.mjs orchestrates the batch in auto mode: a pipeline over clusters dispatches ai-tfa-coordinator agents — representative full loop → siblings one-turn-confirm, no barrier between stages — with a structured RCA schema. Sandbox-correct: does no state I/O itself (orchestrator passes the clustered work-list + manifest + pre-computed build evidence via args; each coordinator agent persists its own CSV row eagerly). Gap → 'unavailable' back to TFA, no user prompt. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
references/interactive-mode.md specifies the orchestrator loop: spawn ai-tfa-coordinator subagents 5 at a time; a subagent cannot pause to prompt the user, so on an evidence gap it ends early with a GAP_OUTPUT carrying resume handles (threadId+turnId); the orchestrator asks A1, then re-dispatches with resume= and the answer. Same coordinator as auto — only the gap action differs. Compact blocks not transcripts (lean main context); partial-first; auto-first/ escalate-the-residue noted. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
references/github-evidence.md specifies exactly what each github ask needs (diff-since-baseline, PRs-in-window touching the failing path, blame, deploy timing) and the discovery order GitHub MCP → gh → degrade — no shipped forensics harness. Adds the adversarial falsification protocol (path overlap / deploy-state guard / direction) so only verdict:supported suspects enter related_prs; ruled-out suspects stay as disconfirming evidence. Coordinator runs it for product_code/ deploy/ci asks, reusing the pre-computed build evidence. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… in U4) lib/coverage.mjs derives a per-row evidence-coverage band — TFA confidence capped by coverage (full keeps it, partial→medium, thin→low) so a RESOLVED built with evidence unavailable reads as lower confidence BECAUSE of the gap. lib/report.mjs renders the CSV to markdown: status counts + per-test table + coverage caveats, degrading missing fields to 'not available' and never crashing on an empty/partial batch. report-format.md documents the stamp, layout, and the startup reaper resume path. Blast-radius digest explicitly deferred. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…harness lib/loop.mjs (runRcaLoop) is an executable mirror of the coordinator loop — status branching, ask routing, gap resolution, turn-cap, one-thread, soft-PENDING — driven by an injected submit(). It doubles as the D5 sequential thin-client harness. tests/conformance.test.mjs replays recorded tfaRcaTurn transcripts (resolved/blocked/pending/turn-cap fixtures) and proves: rca capture, test_logs skip, soft-PENDING no-re-poll, turn-cap never submits a 7th turn, and the degraded (no-capability auto) path still reaches a valid terminal RCA — same loop, same result. 48 tests green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…, skip turn-cap gather
Code-review fixes (suggested, non-blocking):
- pending-resume removed from TERMINAL_STATES → soft-PENDING rows are now re-claimable, listed by pendingRows, and skipped by the reaper (they cleared in_flight), so the retained threadId/turnId actually drive an in-session resume instead of being stranded as a permanent non-terminal terminal.
- flip() now rejects a missing/non-terminal rca_done without mutating, so a partial flip can't clear the claim yet leave the row pending (duplicate-RCA clobber).
- loop checks the turn-cap BEFORE gathering, so evidence on the never-submitted final turn isn't gathered for nothing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What this is
A portable Claude-Code/Cursor plugin that drives BrowserStack's collaborative
RCA loop (tfaRcaTurn) over all failed tests of a build, generic across
product and infra. It wraps the stable bstack MCP tools (listTestIds +
tfaRcaTurn) and adds the harness that batches RCA over a whole build, clusters
failures by signature, routes evidence requests to whatever skills/tools the
client already has, and records a per-test RCA.
Generalizes the product-/infra-coupled obs-tfa-rca skill: the loop, routing,
digest, and report core are ported; the coupling points (BrowserStack discovery,
fixed kubectl/chitragupta/bifrost, /tmp, server name) become config +
runtime capability discovery.
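As a rough illustration of what "config + runtime capability discovery" could look like, the sketch below classifies an evidence ask as skip, gather, or gap against a capability manifest of the form capability → {available, via}. Everything named here (classifyAsk, resolveGap, the ask type strings other than test_logs, the manifest entries) is hypothetical and is not taken from lib/routing.mjs; only the skip/gather/gap split and the auto=unavailable / interactive=ask-user fork come from the commit messages.

```javascript
// Hypothetical sketch: names and shapes are illustrative, not the plugin's actual API.

// Capability manifest, built once per run: capability -> { available, via }.
const manifest = new Map([
  ["github_diff", { available: true, via: "gh" }],
  ["deploy_timing", { available: false, via: null }],
]);

// Ask types that are never gathered. test_logs is the one the source names;
// why it is skipped is not stated there, so treat this set as an assumption.
const skipTypes = new Set(["test_logs"]);

// Classify one evidence ask as skip / gather / gap.
function classifyAsk(ask, manifest, skipTypes) {
  if (skipTypes.has(ask.type)) return { action: "skip" };
  const capability = manifest.get(ask.type);
  if (capability && capability.available) {
    return { action: "gather", via: capability.via };
  }
  // Unknown or unavailable capability: nothing on this client can fetch it.
  return { action: "gap" };
}

// The gap action is the only place where the two modes diverge.
function resolveGap(mode) {
  return mode === "interactive" ? "ask-user" : "unavailable";
}
```

Keeping the mode fork in a single small function is what would let the same coordinator serve both modes: the classifier never needs to know which mode it is running in.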
Architecture
Three roles over the stable MCP contract:
- rca-build skill (build-level orchestrator): mandatory pre-flight GitHub
  intake, discovery via listTestIds, the CSV/WAL state spine, failure-signature
  clustering, build-evidence pre-compute + capability manifest, and fan-out.
- ai-tfa-coordinator agent (per-test): drives the tfaRcaTurn loop
  (turn-cap, one-thread, soft-PENDING, digest-not-dump); routes each ask by
  capability (no hardcoded tools); runs suspect-PR falsification.
Two modes, one coordinator (only the injected gap-resolver differs):
- Auto: workflows/rca-batch.mjs dynamic workflow (5 concurrent, no user
  input; gap → "unavailable" → best-effort finalize).
- Interactive: on an evidence gap the coordinator subagent ends early with a
  GAP_OUTPUT (resume handles) and the orchestrator asks the user, then
  re-dispatches with resume=.
Key decisions
- Clustering: signature = normalize(category | first-error-line | file_path);
  the representative runs the full loop, siblings pre-seed a one-turn confirm
  against their own logs, with a fall-back-to-own-loop safeguard (never blindly
  inherit).
- GitHub evidence: references/github-evidence.md specifies the exact evidence
  needed; the coordinator uses GitHub MCP → gh → degrade.
- State: the CSV/WAL spine is resumable; pending-resume rows stay re-claimable.
- Coverage: a RESOLVED built with evidence unavailable reads as lower
  confidence because of the gap.
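The clustering decision can be sketched roughly as follows. This is an assumption-laden illustration, not the code in lib/signature.mjs: the row fields (id, category, error, file, flaky), the numeric ids, the placeholder tokens, and the exact regular expressions are all hypothetical. What it takes from the PR is the shape of the idea: fold volatile tokens (timestamps, UUIDs, hex, line:col, numbers), group rows by signature, pick a deterministic representative (non-flaky first, then smallest id), and leave signal-less rows as singletons.

```javascript
// Hypothetical sketch; field names and regexes are assumptions, not lib/signature.mjs.

// Fold volatile tokens so the same root cause yields the same string.
function normalize(text) {
  return String(text)
    .toLowerCase()
    .replace(/\d{4}-\d{2}-\d{2}[t ]\d{2}:\d{2}:\d{2}(?:\.\d+)?z?/g, "<ts>")
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/g, "<uuid>")
    .replace(/0x[0-9a-f]+/g, "<hex>")
    .replace(/:\d+:\d+/g, ":<pos>")
    .replace(/\d+/g, "<n>")
    .replace(/\s+/g, " ")
    .trim();
}

// signature = normalize(category | first-error-line | file); null when there is no signal.
function signature(row) {
  const firstErrorLine = (row.error || "").split("\n")[0];
  const parts = [row.category || "", firstErrorLine, row.file || ""].map(normalize);
  return parts.every((part) => part === "") ? null : parts.join("|");
}

// Group rows by signature; signal-less rows become their own singleton cluster.
function cluster(rows) {
  const groups = new Map();
  for (const row of rows) {
    const sig = signature(row);
    const key = sig === null ? `singleton:${row.id}` : sig;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return [...groups.values()].map((members) => {
    // Deterministic representative: non-flaky first, then smallest id.
    const sorted = [...members].sort(
      (a, b) => Number(Boolean(a.flaky)) - Number(Boolean(b.flaky)) || a.id - b.id
    );
    return { representative: sorted[0], siblings: sorted.slice(1) };
  });
}
```

With this shape, two timeouts that differ only in duration and line:col fall into one cluster, so only the representative pays for the full loop while each sibling attempts a one-turn confirm.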
In scope: ideation #1–#5 + the v1 slice of #6 (coverage stamp), #7 (resume), #8
(conformance fixture). Deferred: #6 blast-radius digest, #8 git-forensics-MCP,
cross-session durability, Codex/Gemini orchestration parity.
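The coverage stamp in scope here caps the reported confidence by the evidence-coverage band (full keeps it, partial → medium, thin → low). A minimal sketch of that cap, assuming three confidence levels named low/medium/high and taking the band as an input rather than deriving it; capConfidence and both lookup tables are hypothetical names, not the contents of lib/coverage.mjs:

```javascript
// Hypothetical sketch; level names and function name are assumptions.
const RANK = new Map([["low", 0], ["medium", 1], ["high", 2]]);

// Highest confidence each coverage band is allowed to report.
const CAP_BY_BAND = new Map([["full", "high"], ["partial", "medium"], ["thin", "low"]]);

// Return the confidence, lowered to the band's cap when it exceeds it.
function capConfidence(confidence, band) {
  if (!RANK.has(confidence) || !CAP_BY_BAND.has(band)) {
    throw new RangeError(`unknown confidence/band: ${confidence}/${band}`);
  }
  const cap = CAP_BY_BAND.get(band);
  return RANK.get(confidence) <= RANK.get(cap) ? confidence : cap;
}
```

Because this is a cap rather than a remap, a result that was already low stays low under partial coverage; only confidence above the band's ceiling is reduced.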
Testing
- Run with npm test (→ node --test); dependency-free.
- Unit: reaper + flip-guard + pending-resume resumability, signature
  normalization + clustering, evidence cache, coverage band, report renderer.
- Conformance: replays recorded tfaRcaTurn transcripts
  (resolved/blocked/pending/turn-cap) through the executable loop mirror
  (lib/loop.mjs, which doubles as the sequential thin-client harness), proving
  rca capture, test_logs skip, soft-PENDING no-re-poll, turn-cap never submits a
  7th turn, and the degraded no-capability path still reaches a valid terminal RCA.
- workflows/rca-batch.mjs follows the documented Workflow runtime shape (meta +
  pipeline/parallel/agent); it's validated via the conformance fixtures and
  the unit-tested libs it relies on (the runtime DSL globals can't be unit-loaded).
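The turn-cap property the conformance tests assert can be pictured with a small loop driven by an injected submit(). This is a sketch under assumptions, not lib/loop.mjs: the reply shape ({ status, asks, rca }), the TURN_CAP status string, and the name runLoopSketch are hypothetical. The cap of 6 is inferred only from the statement that a 7th turn is never submitted, and the cap is checked before gathering, as the review-fix commit describes.

```javascript
// Hypothetical sketch; reply shape and status names are assumptions.
const MAX_TURNS = 6; // inferred from "never submits a 7th turn"

// submit({ turn, evidence }) -> Promise<{ status, asks, rca }>
// gather(ask) -> Promise<evidence item>
async function runLoopSketch(submit, gather) {
  let evidence = [];
  for (let turn = 1; ; turn++) {
    const reply = await submit({ turn, evidence });
    if (reply.status === "RESOLVED") {
      return { status: "RESOLVED", rca: reply.rca, turns: turn };
    }
    // Check the cap BEFORE gathering: evidence for a turn that will never
    // be submitted would be wasted work.
    if (turn >= MAX_TURNS) {
      return { status: "TURN_CAP", rca: null, turns: turn };
    }
    evidence = await Promise.all((reply.asks || []).map(gather));
  }
}
```

Injecting submit() and gather() is what makes the loop replayable against recorded transcripts: a test can hand in a stub that returns canned replies and count how many turns were actually submitted.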
Install / usage
Post-Deploy Monitoring & Validation
No production/runtime impact: this is a client-side plugin (skills/agents/workflow …).
To validate, run /rca-build against a known red build and confirm every failed
test lands a terminal CSV row + a per-test RCA. The two things to validate live:
the sibling one-turn-confirm cost win, and the "last green" baseline resolution
(both have safeguards so correctness doesn't depend on them).
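One way to make the "every failed test lands a terminal CSV row + a per-test RCA" check mechanical is sketched below. It assumes the CSV has already been parsed into row objects, and the column names (test_id, status, rca) and the set of terminal statuses are hypothetical; the plugin's real definitions live in lib/csv-state.mjs and may differ. The only detail taken from the PR is that pending-resume is not terminal.

```javascript
// Hypothetical sketch; column names and the terminal-status set are assumptions.
const TERMINAL = new Set(["RESOLVED", "BLOCKED", "TURN_CAP"]);

// Return the ids of failed tests that lack a terminal row or a non-empty RCA.
function unfinishedTests(failedIds, rows) {
  const byId = new Map(rows.map((row) => [row.test_id, row]));
  return failedIds.filter((id) => {
    const row = byId.get(id);
    return !row || !TERMINAL.has(row.status) || !row.rca;
  });
}
```

An empty result would mean the batch is complete; any ids returned point at rows worth inspecting (missing, still pending-resume, or terminal without an RCA).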
🤖 Generated with Claude Code